Recall that the method of moments involves calculating sample statistics, such as the mean and variance, from the actual data and then matching these to the corresponding moments of a known analytic distribution
Example
You have a set of data with a mean of 23 and a standard deviation of 12. You believe the data comes from a lognormal distribution: $\ln N(\mu, \sigma^2)$. Calculate $\mu$ and $\sigma$ using the method of moments
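As a worked sketch of this calculation in Python: for a lognormal variable $X$ the standard moment formulae are $E[X]=e^{\mu+\sigma^2/2}$ and $var[X]=(e^{\sigma^2}-1)e^{2\mu+\sigma^2}$, and inverting these two equations gives $\mu$ and $\sigma$ directly from the sample mean and standard deviation.

```python
import math

# Sample moments given in the example
m = 23.0   # sample mean
s = 12.0   # sample standard deviation

# For X ~ lognormal(mu, sigma^2):
#   E[X]   = exp(mu + sigma^2/2)
#   var[X] = (exp(sigma^2) - 1) * exp(2*mu + sigma^2)
# Inverting the two moment equations:
sigma2 = math.log(1.0 + (s / m) ** 2)   # sigma^2 = ln(1 + var/mean^2)
mu = math.log(m) - sigma2 / 2.0         # mu = ln(mean) - sigma^2/2

print(mu, math.sqrt(sigma2))            # roughly 3.015 and 0.490
```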
Price data is not a simple set of values from a distribution.
There are a number of issues we need to be careful about:
How do we solve these problems?
We now have data we would expect to fit a normal distribution
We still have to solve the weekend problem
We could either treat the weekend gap as a single one-day step or treat it as a three-day step
Both are assumptions, and it makes more sense to test the validity of each against the data that we have
Essentially we are dealing with a composite process which combines a one day process with a three day process
By viewing the spreadsheet: Fitting volatility to weekly data.xls we observe a very interesting phenomenon
We believe our data comes from the SDE: $\frac{dS_t}{S_t}=\mu dt + \sigma dW_t$
The solution of this is: $S_t = S_0 e^{(\mu-\frac{\sigma^2}{2})t + \sigma W_t}$
Or in other words: $S_{t+\Delta t}= S_t e^{(\mu-\frac{\sigma^2}{2})\Delta t + \sigma W_{\Delta t}}$
$\therefore ln(S_{t+\Delta t}) - ln(S_t) = (\mu-\frac{\sigma^2}{2})\Delta t + \sigma W_{\Delta t}$
And so $var(ln(S_{t+\Delta t}) - ln(S_t)) = \sigma^2 \Delta t$
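A minimal sketch of this estimator, assuming equally spaced daily closing prices (so it ignores the weekend issue discussed above) and a 252-day trading year for the annualisation:

```python
import numpy as np

def annualised_volatility(prices, dt=1.0 / 252.0):
    """Estimate sigma from closing prices using var(ln S_{t+dt} - ln S_t) = sigma^2 * dt.

    Assumes each observation is separated by the same time step dt, measured in years.
    """
    log_returns = np.diff(np.log(prices))
    return np.sqrt(np.var(log_returns, ddof=1) / dt)

# Hypothetical example prices
prices = np.array([100.0, 101.2, 100.7, 102.3, 101.9, 103.4])
print(annualised_volatility(prices))
```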
The Cholesky decomposition allows us to take a set of known correlations between random variables and then construct a matrix which will allow us to recreate these correlations from a set of independent random variables.
In the case of two random variables, suppose $X$ and $Y$ are both drawn from $N(0,1)$ and have a correlation of $\rho$.
If we are now given a new random variable $X' \sim N(0,1)$ such that $corr(X, X')=0$, how do we derive $Y$ such that $corr(X,Y)=\rho$?
In other words, what is the matrix $\begin{bmatrix} a & b \\ c & d\end{bmatrix}$ such that $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} X \\ X' \end{bmatrix} = \begin{bmatrix} X \\ Y \end{bmatrix}$?
We have $X=aX + bX'$ and $Y=cX + dX' \therefore a=1, b=0$
So $cov(X,Y) = ac\,var(X) + bd\,var(X') + (ad + bc)\,cov(X, X')$
Since $var(X) = var(X') = 1$ and $cov(X, X') = 0$, this gives $cov(X,Y) = ac + bd = c$
$\therefore c = \rho$
Given $var (Y) = 1$ we have that $var (cX + dX') = 1$
$\therefore c^2 var(X) + d^2 var(X') = 1$
$\therefore \rho^2 + d^2 = 1$
$\therefore d=\sqrt{1-\rho^2}$
So our Cholesky decomposition is:
$\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} X \\ X' \end{bmatrix} $
If we now multiply our matrix by its transpose we observe the following:
$\begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \times \begin{bmatrix} 1 & \rho \\ 0 & \sqrt{1-\rho^2} \end{bmatrix} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$
We notice that we have reconstructed the correlation matrix of the original relationship between the random variables $X$ and $Y$
We can extend this logic to more than 2 random variables
For example for three random variables the Cholesky decomposition is:
$\begin{bmatrix} 1 & \rho_{1,2} & \rho_{1,3} \\ \rho_{1,2} &1& \rho_{2,3} \\ \rho_{1,3} & \rho_{2,3}&1 \end{bmatrix} = \begin{bmatrix} t_{1,1} & 0 & 0 \\ t_{1,2} & t_{2,2} & 0 \\ t_{1,3} & t_{2,3} & t_{3,3} \end{bmatrix} \times \begin{bmatrix} t_{1,1} & t_{1,2} & t_{1,3} \\ 0 & t_{2,2} & t_{2,3} \\0 &0 & t_{3,3} \end{bmatrix}$
Solving gives us:
$\begin{bmatrix} t_{1,1} & 0 & 0 \\ t_{1,2} & t_{2,2} & 0 \\ t_{1,3} & t_{2,3} & t_{3,3} \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ \rho_{1,2} & \sqrt{1-\rho_{1,2}^2}&0\\ \rho_{1,3} & \frac{\rho_{2,3} - \rho_{1,2} \rho_{1,3}}{\sqrt{1-\rho_{1,2}^2}}&\sqrt{1- \rho_{1,3}^2 - \frac{(\rho_{2,3} - \rho_{1,2} \rho_{1,3})^2}{1-\rho_{1,2}^2}} \end{bmatrix}$
This may look complicated, but the system of equations (by symmetry, six distinct equations in the six unknown entries of the triangular matrix) reduces naturally one equation at a time and is very easy to solve.
A numerical algorithm is easy to implement
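A minimal sketch of such an algorithm, using the standard row-by-row recursion, with a small hypothetical correlation matrix as a test case:

```python
import math

def cholesky(corr):
    """Return the lower-triangular matrix L such that L L^T equals corr.

    corr is a symmetric positive-definite matrix given as a list of lists;
    the entries of L are solved for one at a time, row by row.
    """
    n = len(corr)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)     # diagonal entry
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]    # off-diagonal entry
    return L

# Hypothetical example
corr = [[1.0, 0.3, 0.2],
        [0.3, 1.0, 0.5],
        [0.2, 0.5, 1.0]]
for row in cholesky(corr):
    print([round(x, 4) for x in row])
```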
A correlation matrix between $A$, $B$ and $C$ is given by $\begin{bmatrix} 1 & 0.6 & 0.4 \\ 0.6 & 1 & 0.5 \\ 0.4 & 0.5 & 1 \end{bmatrix}$
Show how you would generate the random variables $A$, $B$ and $C$ from independent uniform random variables $U_1$, $U_2$ and $U_3$
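One possible way to approach this, sketched below, is to map the uniforms to standard normals with the inverse normal CDF and then apply the Cholesky factor of the given matrix; this sketch uses numpy's built-in `cholesky` and scipy's `norm.ppf` rather than the hand-derived factor.

```python
import numpy as np
from scipy.stats import norm

corr = np.array([[1.0, 0.6, 0.4],
                 [0.6, 1.0, 0.5],
                 [0.4, 0.5, 1.0]])
L = np.linalg.cholesky(corr)       # lower-triangular factor, L @ L.T == corr

# U1, U2, U3: independent uniforms on (0, 1), many draws of each
U = np.random.default_rng().uniform(size=(3, 10_000))

Z = norm.ppf(U)                    # independent N(0,1) via the inverse normal CDF
A, B, C = L @ Z                    # correlated N(0,1) variables

print(np.corrcoef([A, B, C]))      # approximately the target correlation matrix
```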
Fitting a function of several parameters to a set of data can be a very computationally intensive task
If there is only one parameter then Goal Seek in Excel will probably be perfectly serviceable, but you cannot use this method if you have multiple parameters
First you need to decide what is meant by best fit. For most purposes minimising the sum of the squares of the deviations between the actual data and the fitted values is a sensible metric, hence the expression "least squares regression"
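For concreteness, a sketch of such an objective function, where `model`, `x_data` and `y_data` are hypothetical placeholders for the fitted function and the observed data:

```python
import numpy as np

def sum_squared_deviations(params, x_data, y_data, model):
    """The least squares objective: the quantity to be minimised over params."""
    residuals = y_data - model(x_data, params)
    return np.sum(residuals ** 2)
```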
Then we need to decide how to search the vector space of all possible parameter combinations for the one for which the sum of the squares of the deviations is the least
We list below a number of methods in order of increasing sophistication.
We assume:
As per the simple grid search but:
Methods 2 and 3 both require us to be able to minimise along an individual line within our hyperspace
Whichever method we use (apart from the grid search) we will need to minimise our function along any particular line in the vector space of the parameters
Line minimisation is analogous to Newton-Raphson root finding, although we are not solving $f(x)=0$ but rather $f'(x)=0$, i.e. the point at which the function (the least squares deviation) is at a minimum
For Newton-Raphson the assumption we made was local linearity, so this time the assumption we make is that the function is locally quadratic
We can use the first and second derivatives if we have them (the gradient vector and the Hessian matrix respectively), but if not a simple numerical procedure will suffice
Calculate $f_0=f(\textbf{P})$, $f_1=f(\textbf{P}+h\textbf{x})$, $f_2=f(\textbf{P}+2h\textbf{x})$, where $\textbf{x}$ is the unit vector in parameter space of the direction in which we are minimising and $h$ is a small increment
Then $f''_0 = \frac{(f_2-f_1)-(f_1-f_0)}{h^2}$ is the rate at which the (negative) gradient is getting shallower and $f'_0 = \frac{f_1-f_0}{h}$ is the current gradient in the direction $\textbf{x}$, so our initial line minimisation guess is $\textbf{P}_{n+1} = \textbf{P}_n - \textbf{x} \times \frac{f'_0}{f''_0}$
We then repeat this process along the same line until we have found our one dimensional minimum - at which point we continue with whatever multi-dimensional method we were using
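A sketch of this line minimisation, assuming the objective `f` takes a numpy parameter vector, `P` and `x` are numpy arrays, and the increment and tolerances are arbitrary choices:

```python
import numpy as np

def line_minimise(f, P, x, h=1e-4, tol=1e-8, max_iter=50):
    """Minimise f along the unit direction x starting from P,
    using the locally quadratic approximation described above."""
    for _ in range(max_iter):
        f0 = f(P)
        f1 = f(P + h * x)
        f2 = f(P + 2 * h * x)
        grad = (f1 - f0) / h                      # f'_0: gradient along x
        curv = ((f2 - f1) - (f1 - f0)) / h ** 2   # f''_0: curvature along x
        if abs(curv) < 1e-12:
            break                                 # locally flat; stop
        step = grad / curv
        P = P - step * x                          # P_{n+1} = P_n - x * f'_0 / f''_0
        if abs(step) < tol:
            break
    return P
```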
On first inspection the second and third methods would appear to be quite good, but in practice they both tend to zigzag very slowly towards a solution; a much more efficient approach is Powell's method of conjugate directions
The method is as follows:
Step 1: Initialise the set of directions $\textbf{u}_i$ to the basis vectors $\textbf{e}_i$
Step 2: Save your starting position as $\textbf{P}_0$
Step 3: For $i=1,...,N$, move $\textbf{P}_{i-1}$ to the minimum along direction $\textbf{u}_i$ and call this point $\textbf{P}_i$
Step 4: For $i=1,...,N-1$, set $\textbf{u}_i$ to be $\textbf{u}_{i+1}$
Step 5: Set $\textbf{u}_N$ to be $\textbf{P}_N - \textbf{P}_0$
Step 6: Move $\textbf{P}_N$ to the minimum along the direction $\textbf{u}_N$ and call this point $\textbf{P}_0$
Step 7: Repeat this basic algorithm $N$ times and any quadratic form will be minimised
The result in step 7 was proved by Powell in 1964
The spreadsheet Powell.xls contains an implementation of the method
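For comparison, a minimal Python sketch of steps 1 to 7, reusing the `line_minimise` sketch given earlier; in practice a library routine such as `scipy.optimize.minimize(..., method='Powell')` does the same job.

```python
import numpy as np

def powell_minimise(f, P0, n_cycles=None, h=1e-4):
    """Direction-set minimisation following steps 1 to 7 above.

    f maps a parameter vector to the objective (e.g. the sum of squared
    deviations); relies on the line_minimise sketch defined earlier.
    """
    N = len(P0)
    if n_cycles is None:
        n_cycles = N                          # step 7: N repetitions of the basic procedure
    U = [np.eye(N)[i] for i in range(N)]      # step 1: directions = basis vectors
    P = np.asarray(P0, dtype=float)
    for _ in range(n_cycles):
        P_start = P.copy()                    # step 2: save the starting position
        for u in U:                           # step 3: minimise along each direction in turn
            P = line_minimise(f, P, u, h)
        U = U[1:]                             # step 4: shift the directions down by one
        new_dir = P - P_start                 # step 5: the overall direction moved this cycle
        norm = np.linalg.norm(new_dir)
        if norm > 0:
            new_dir = new_dir / norm          # keep directions as unit vectors
        U.append(new_dir)
        P = line_minimise(f, P, new_dir, h)   # step 6: minimise along the new direction
    return P
```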